Improved ROI and within frame discriminant features for lipreading

Authors

  • Gerasimos Potamianos
  • Chalapathy Neti
Abstract

We study three aspects of designing appearance-based visual features for automatic lipreading: (a) the choice of the video region of interest (ROI) on which image transform features are obtained; (b) the extraction of speech discriminant features at each frame; and (c) the use of temporal information to improve visual speech modeling. In particular, with respect to (a), we propose a ROI that includes the speaker's jaw and cheeks in addition to the traditionally used mouth/lip region; with respect to (b) and (c), we propose a two-stage linear discriminant analysis, applied both within each frame and across a large number of frames. On a large-vocabulary, continuous speech audio-visual database, the proposed visual features result in a 13% absolute reduction in visual-only word error rate over a baseline visual front end, and in an additional 28% relative improvement in audio-visual over audio-only phonetic classification accuracy.
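The two-stage discriminant analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the feature dimensions, the number of stacked frames, or the class labels, so `dim1`, `context`, `dim2`, the helper names, and the random data standing in for per-frame image transform features are all assumptions made for illustration. The first LDA is applied to each frame independently; the second is applied to a window of neighbouring first-stage outputs concatenated together.

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Return a (d, n_components) LDA projection for data X (n, d) with labels y."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += 1e-6 * np.eye(d)  # small ridge so Sw is invertible
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)[:n_components]
    return evecs[:, order].real

def stack_frames(F, context):
    """Concatenate each frame with `context` neighbours on each side.

    Sequence edges are padded by repeating the first and last frame.
    """
    T = F.shape[0]
    padded = np.pad(F, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])

def two_stage_lda(frames, labels, dim1, context, dim2):
    """Stage 1: LDA within each frame. Stage 2: LDA across stacked frames."""
    W1 = lda_projection(frames, labels, dim1)
    static = frames @ W1                      # per-frame discriminant features
    stacked = stack_frames(static, context)   # add temporal context
    W2 = lda_projection(stacked, labels, dim2)
    return stacked @ W2

# Hypothetical data: 300 frames of 20-dimensional features, 4 classes.
rng = np.random.default_rng(0)
T, d = 300, 20
labels = rng.integers(0, 4, size=T)
frames = rng.normal(size=(T, d)) + labels[:, None]

features = two_stage_lda(frames, labels, dim1=3, context=2, dim2=3)
print(features.shape)
```

With four classes, LDA yields at most three discriminative directions, which is why the sketch keeps three dimensions at each stage; a real system would choose these sizes from the number of speech classes and the available training data.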


Similar resources

LipNet: End-to-End Sentence-level Lipreading

Lipreading is the task of decoding text from the movement of a speaker’s mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). However, existing work on models trained end-to-end perform only word classification, rat...


Palmprint Image Processing and Linear Discriminant Analysis Method

In this paper, the method of processing and linear discriminant analysis of palmprint image is proposed. The palmprint image processing focuses on the location and segmentation which involves rotation and transition. By means of finding the two locate points about the index finger and middle finger, ring finger and little finger, the palmprint image is rotated and corrected a new coordinate sys...


A Cascade Image Transform for Speaker Independent Automatic Speech Reading

We propose a three-stage pixel based visual front end for automatic speechreading (lipreading) that results in improved recognition performance of spoken words or phonemes. The proposed algorithm is a cascade of three transforms applied to a three-dimensional video region of interest that contains the speaker’s mouth area. The first stage is a typical image compression transform that achieves a...


A Cascade Visual Front End for Speaker Independent Automatic Speechreading

We propose a three-stage pixel based visual front end for automatic speechreading (lipreading) that results in significantly improved recognition performance of spoken words or phonemes. The proposed algorithm is a cascade of three transforms applied on a three-dimensional video region-of-interest that contains the speaker's mouth area. The first stage is a typical image compression transform that...


Impact of PET-CT motion correction in minimising the gross tumour volume in non-small cell lung cancer

Objective: To investigate the impact of respiratory motion on the localization and quantification of lung lesions for the Gross Tumour Volume utilizing an in-house developed Auto3Dreg programme and dynamic NURBS-based cardiac-torso digitised phantom (NCAT). Methods: Respiratory motion may result in more than 30% underestimation of the SUV values of lung, liver and kidney tumour lesions. The m...




Publication date: 2001